Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Bean, Andrew M., Kearns, Ryan Othniel, Romanou, Angelika, Hafner, Franziska Sofia, Mayne, Harry, Batzner, Jan, Foroutan, Negar, Schmitz, Chris, Korgul, Karolina, Batra, Hunar, Deb, Oishi, Beharry, Emma, Emde, Cornelius, Foster, Thomas, Gausen, Anna, Grandury, María, Han, Simeng, Hofmann, Valentin, Ibrahim, Lujain, Kim, Hazel, Kirk, Hannah Rose, Lin, Fangru, Liu, Gabrielle Kaili-May, Luettgau, Lennart, Magomere, Jabez, Rystrøm, Jonathan, Sotnikova, Anna, Yang, Yushi, Zhao, Yilun, Bibi, Adel, Bosselut, Antoine, Clark, Ronald, Cohan, Arman, Foerster, Jakob, Gal, Yarin, Hale, Scott A., Raji, Inioluwa Deborah, Summerfield, Christopher, Torr, Philip H. S., Ududec, Cozmin, Rocher, Luc, Mahdi, Adam

arXiv.org Artificial Intelligence

Evaluating large language models (LLMs) is crucial for both assessing their capabilities and identifying safety or robustness issues prior to deployment. Reliably measuring abstract and complex phenomena such as 'safety' and 'robustness' requires strong construct validity, that is, having measures that represent what matters to the phenomenon. With a team of 29 expert reviewers, we conduct a systematic review of 445 LLM benchmarks from leading conferences in natural language processing and machine learning. Across the reviewed articles, we find patterns related to the measured phenomena, tasks, and scoring metrics which undermine the validity of the resulting claims. To address these shortcomings, we provide eight key recommendations and detailed actionable guidance to researchers and practitioners in developing LLM benchmarks.


Multi-Agent Vulcan: An Information-Driven Multi-Agent Path Finding Approach

Olkin, Jake, Parimi, Viraj, Williams, Brian

arXiv.org Artificial Intelligence

Scientists often search for phenomena of interest while exploring new environments. Autonomous vehicles are deployed to explore such areas where human-operated vehicles would be costly or dangerous. Online control of autonomous vehicles for information-gathering is called adaptive sampling and can be framed as a POMDP that uses information gain as its principal objective. While prior work focuses largely on single-agent scenarios, this paper confronts challenges unique to multi-agent adaptive sampling, such as avoiding redundant observations, preventing vehicle collision, and facilitating path planning under limited communication. We start with Multi-Agent Path Finding (MAPF) methods, which address collision avoidance by decomposing the MAPF problem into a series of single-agent path planning problems. We then present information-driven MAPF which addresses multi-agent information gain under limited communication. First, we introduce an admissible heuristic that relaxes mutual information gain to an additive function that can be evaluated as a set of independent single agent path planning problems. Second, we extend our approach to a distributed system that is robust to limited communication. When all agents are in range, the group plans jointly to maximize information. When some agents move out of range, communicating subgroups are formed and the subgroups plan independently. Since redundant observations are less likely when vehicles are far apart, this approach only incurs a small loss in information gain, resulting in an approach that gracefully transitions from full to partial communication. We evaluate our method against other adaptive sampling strategies across various scenarios, including real-world robotic applications. Our method was able to locate up to 200% more unique phenomena in certain scenarios, and each agent located its first unique phenomenon faster by up to 50%.
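
The additive relaxation and collision avoidance described above can be illustrated with a minimal sketch (grid world, one-step greedy planning with per-step cell reservations; the planner structure and names are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # 4-connected moves plus waiting

def prioritized_info_plan(starts, info, horizon):
    """Plan one step at a time: each agent, in priority order, moves to the
    neighboring cell with the highest remaining information gain that no
    higher-priority agent has reserved this step. Observing a cell zeros its
    information, so later agents avoid redundant observations; per-step
    reservations prevent vehicle collisions."""
    info = info.astype(float).copy()
    paths = [[s] for s in starts]
    for _ in range(horizon):
        reserved = set()
        for path in paths:
            r, c = path[-1]
            best, best_gain = (r, c), -1.0
            for dr, dc in MOVES:
                nr, nc = r + dr, c + dc
                if (0 <= nr < info.shape[0] and 0 <= nc < info.shape[1]
                        and (nr, nc) not in reserved
                        and info[nr, nc] > best_gain):
                    best, best_gain = (nr, nc), info[nr, nc]
            reserved.add(best)
            info[best] = 0.0   # observation made: no redundant gain for others
            path.append(best)
    return paths
```

Because each agent zeroes the information of the cell it claims, the joint objective decomposes into a sum of per-agent gains, mirroring the additive relaxation of mutual information described in the abstract.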


CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning

He, Jinwei, Lu, Feng

arXiv.org Artificial Intelligence

Large language models (LLMs) have been utilized to solve diverse reasoning tasks, encompassing common-sense, arithmetic, and deduction tasks. However, given the difficulty of reversing thinking patterns and the presence of irrelevant premises, how to determine the authenticity of the cause in abductive logical reasoning remains underexplored. Inspired by the hypothesis-and-verification method and the identification of irrelevant information in human thinking, we propose a new framework for LLM abductive logical reasoning called CauseJudger (CJ), which identifies the authenticity of a possible cause by transforming thinking from reverse to forward and removing irrelevant information. In addition, we construct an abductive logical reasoning dataset for the decision task, called CauseLogics, which contains 200,000 tasks of varying reasoning lengths. Our experiments show the efficiency of CJ through overall experiments, ablation experiments, and case studies on our dataset and a reconstructed public dataset. Notably, CJ's implementation is efficient, requiring only two calls to the LLM. Its impact is profound: when using gpt-3.5, CJ achieves a maximum correctness improvement of 41% compared to Zero-Shot-CoT. Moreover, with gpt-4, CJ attains an accuracy exceeding 90% across all datasets.
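
The two-call structure the abstract describes (reverse-to-forward transformation with irrelevant-premise removal, then verification) can be sketched roughly as follows; the prompt wording and function names are assumptions for illustration, not the paper's:

```python
def cause_judger(premises, rules, candidate_cause, query, llm):
    """Hedged sketch of a two-call abductive reasoning pipeline.
    `llm` is any callable mapping a prompt string to a text response.
    Call 1: restate the abductive question as forward reasoning and
    drop premises irrelevant to the candidate cause.
    Call 2: judge whether the candidate cause entails the observation."""
    prompt1 = (
        "Assume the candidate cause holds: " + candidate_cause + "\n"
        "Keep only the rules and facts needed to reason forward from it.\n"
        "Rules: " + "; ".join(rules) + "\nFacts: " + "; ".join(premises)
    )
    filtered_context = llm(prompt1)
    prompt2 = (
        "Using only this context:\n" + filtered_context +
        "\nDoes the candidate cause '" + candidate_cause +
        "' logically lead to: " + query + "? Answer True or False."
    )
    verdict = llm(prompt2)
    return verdict.strip().lower().startswith("true")
```

The first call performs the reverse-to-forward transformation and filtering; the second performs the forward entailment check, so the whole judgment costs exactly two LLM calls, as the abstract notes.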


Modelling Language

Grindrod, Jumbly

arXiv.org Artificial Intelligence

This paper argues that large language models have a valuable scientific role to play in serving as scientific models of a language. Linguistic study should not only be concerned with the cognitive processes behind linguistic competence, but also with language understood as an external, social entity. Once this is recognized, the value of large language models as scientific models becomes clear. This paper defends this position against a number of arguments to the effect that language models provide no linguistic insight. It also draws upon recent work in philosophy of science to show how large language models could serve as scientific models.


Near-Optimal Active Learning of Multi-Output Gaussian Processes

Zhang, Yehong (National University of Singapore) | Hoang, Trong Nghia (National University of Singapore) | Low, Kian Hsiang (National University of Singapore) | Kankanhalli, Mohan (National University of Singapore)

AAAI Conferences

This paper addresses the problem of active learning of a multi-output Gaussian process (MOGP) model representing multiple types of coexisting correlated environmental phenomena. In contrast to existing works, our active learning problem involves selecting not just the most informative sampling locations to be observed but also the types of measurements at each selected location for minimizing the predictive uncertainty (i.e., posterior joint entropy) of a target phenomenon of interest given a sampling budget. Unfortunately, such an entropy criterion scales poorly in the numbers of candidate sampling locations and selected observations when optimized. To resolve this issue, we first exploit a structure common to sparse MOGP models for deriving a novel active learning criterion. Then, we exploit a relaxed form of submodularity property of our new criterion for devising a polynomial-time approximation algorithm that guarantees a constant-factor approximation of that achieved by the optimal set of selected observations. Empirical evaluation on real-world datasets shows that our proposed approach outperforms existing algorithms for active learning of MOGP and single-output GP models.
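
The greedy, submodularity-style observation selection the abstract alludes to can be sketched for a single-output GP with a plain posterior-variance (entropy) criterion; this stand-in omits the paper's sparse-MOGP structure and measurement-type selection:

```python
import numpy as np

def greedy_entropy_selection(K, budget, noise=1e-6):
    """Greedily select `budget` observations that maximize GP entropy reduction.
    K: (n, n) prior covariance over candidate sampling locations.
    At each round, pick the candidate with the highest posterior variance,
    then condition the GP on it via a rank-1 covariance update."""
    selected = []
    cov = K.astype(float).copy()
    for _ in range(budget):
        var = np.diag(cov).copy()
        var[selected] = -np.inf          # never re-pick an observed location
        j = int(np.argmax(var))          # max variance = max entropy gain
        selected.append(j)
        kj = cov[:, j:j + 1]
        cov = cov - kj @ kj.T / (cov[j, j] + noise)
    return selected
```

Because entropy reduction exhibits a diminishing-returns (submodular-like) structure, this greedy loop is the standard route to constant-factor approximation guarantees of the kind the paper proves for its relaxed criterion.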


Robust Spatio-Temporal Signal Recovery from Noisy Counts in Social Media

Xu, Jun-Ming, Bhargava, Aniruddha, Nowak, Robert, Zhu, Xiaojin

arXiv.org Artificial Intelligence

Many real-world phenomena can be represented by a spatio-temporal signal: where, when, and how much. Social media is a tantalizing data source for those who wish to monitor such signals. Unlike most prior work, we assume that the target phenomenon is known and we are given a method to count its occurrences in social media. However, counting is plagued by sample bias, incomplete data, and, paradoxically, data scarcity -- issues inadequately addressed by prior work. We formulate signal recovery as a Poisson point process estimation problem. We explicitly incorporate human population bias, time delays and spatial distortions, and spatio-temporal regularization into the model to address the noisy count issues. We present an efficient optimization algorithm and discuss its theoretical properties. We show that our model is more accurate than commonly-used baselines. Finally, we present a case study on wildlife roadkill monitoring, where our model produces qualitatively convincing results.
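
A minimal sketch of this kind of penalized Poisson estimation, assuming a 1-D temporal signal, an exposure term standing in for human population bias, and a squared-difference smoothness penalty (the paper's full model also handles spatial distortion and time delays):

```python
import numpy as np

def recover_intensity(counts, exposure, lam=1.0, steps=500, lr=0.05):
    """Recover a latent rate from biased counts by gradient ascent on the
    penalized Poisson log-likelihood. Assumed model (illustrative, not the
    paper's exact formulation): counts_t ~ Poisson(exposure_t * exp(z_t)),
    with penalty -lam * sum_t (z_{t+1} - z_t)^2 enforcing temporal smoothness."""
    z = np.zeros_like(counts, dtype=float)
    for _ in range(steps):
        f = np.exp(z)
        grad = counts - exposure * f          # Poisson log-likelihood gradient
        diff = z[1:] - z[:-1]
        smooth = np.zeros_like(z)             # gradient of the smoothness penalty
        smooth[:-1] += 2 * lam * diff
        smooth[1:] -= 2 * lam * diff
        z += lr * (grad + smooth)
    return np.exp(z)
```

Dividing out the exposure inside the likelihood is what corrects for population bias: periods with many potential reporters no longer look like periods with more of the underlying phenomenon.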


A Skeptic Embrace of Simulation

Funcke, Alexander (Stockholm University)

AAAI Conferences

Skeptics tend not to be the first to jump on the next bandwagon. In quite a few areas of science, simulations and Complex Adaptive Systems (CAS) have been the bandwagon in question. This paper intends to reach out to the skeptics and convince them to hop on, take over the controls, and make the wagon do a U-turn to aim for the established scientific theories. The argument is that simulation techniques, such as Agent-Based Modelling (ABM), may be epistemically problematic when one sets out to strongly corroborate theories concerned with our overly complex real world. However, using the same techniques to explore the robustness of (or to falsify) existing abstract and idealised mathematical models is epistemically uncomplicated. This allows us to study the effects of reintroducing real-world traits, such as autonomy and heterogeneity, that were previously sacrificed for mathematical tractability.